
Relative Information Loss in the PCA

Authors: Geiger Bernhard C.; Kubin Gernot
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 31/07/2012

In this work we analyze principal component analysis (PCA) as a deterministic input-output system. We show that the relative information loss induced by reducing the dimensionality of the data after performing the PCA is the same as in dimensionality reduction without PCA. Finally, we analyze the case where the PCA uses the sample covariance matrix to compute the rotation. If the rotation matrix is not available at the output, we show that an infinite amount of information is lost. The relative information loss is shown to decrease with increasing sample size.
Comment: 9 pages, 4 figures; extended version of a paper accepted for publication

arXiv.org e-Print Archive

Crossref
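The PCA setting in "Relative Information Loss in the PCA" can be made concrete with a minimal 2-D sketch using only the Python standard library. It estimates the sample mean and sample covariance, computes the PCA rotation in closed form, and checks that the rotation itself is invertible. Any loss therefore comes from discarding rotated coordinates afterwards. All helper names are hypothetical. The value (N - M)/N returned by `relative_information_loss` is an assumption carried over from related work on relative information loss for inputs with a joint probability density. The abstract itself only states that the loss matches dimensionality reduction without PCA.

```python
import math

def sample_covariance_2d(points):
    """Sample mean and unbiased sample covariance [[a, b], [b, c]] of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (a, b, c)

def pca_rotation_2d(a, b, c):
    """Eigenpairs of the symmetric matrix [[a, b], [b, c]], largest eigenvalue first."""
    half_trace = (a + c) / 2
    r = math.hypot((a - c) / 2, b)
    lam1, lam2 = half_trace + r, half_trace - r
    if b != 0:
        v1 = (lam1 - c, b)  # solves (A - lam1 * I) v = 0
    elif a >= c:
        v1 = (1.0, 0.0)
    else:
        v1 = (0.0, 1.0)
    norm = math.hypot(*v1)
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])  # unit vector orthogonal to v1
    return (lam1, lam2), (v1, v2)

def rotate(point, mean, basis):
    """Coordinates of (point - mean) in the orthonormal PCA basis (an invertible map)."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return tuple(v[0] * dx + v[1] * dy for v in basis)

def unrotate(coords, mean, basis):
    """Inverse of rotate(): mean + sum_k coords[k] * basis[k]."""
    x = mean[0] + sum(c * v[0] for c, v in zip(coords, basis))
    y = mean[1] + sum(c * v[1] for c, v in zip(coords, basis))
    return (x, y)

def relative_information_loss(n_dims, m_kept):
    """ASSUMPTION: (N - M) / N when M of N coordinates of a density-valued input are kept."""
    return (n_dims - m_kept) / n_dims

# Usage: estimate the PCA from data, then keep only the first principal coordinate.
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1)]
mean, (a, b, c) = sample_covariance_2d(points)
eigvals, basis = pca_rotation_2d(a, b, c)
reduced = [rotate(p, mean, basis)[0] for p in points]  # 2 -> 1 dimensions
```

Keeping only the first entry of `rotate(...)` is the dimensionality-reduction step. The sketch does not model the case from the abstract in which the rotation is estimated from the data and is unavailable at the output.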

Semi-supervised cross-entropy clustering with information bottleneck constraint

Authors: Geiger Bernhard C.; Śmieja Marek
Publication venue: Elsevier BV
Publication date: 01/01/2017

In this paper, we propose a semi-supervised clustering method, CEC-IB, that models data with a set of Gaussian distributions and retrieves clusters based on a partial labeling provided by the user (partition-level side information). By combining ideas from cross-entropy clustering (CEC) with those from the information bottleneck method (IB), our method trades off three conflicting goals: the accuracy with which the data set is modeled, the simplicity of the model, and the consistency of the clustering with the side information. Experiments demonstrate that CEC-IB performs comparably to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but is faster, more robust to noisy labels, automatically determines the optimal number of clusters, and performs well when not all classes are present in the side information. Moreover, in contrast to other semi-supervised models, it can be successfully applied to discovering natural subgroups if the partition-level side information is derived from the top levels of a hierarchical clustering.

arXiv.org e-Print Archive

Jagiellonian University Repository
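The three-way trade-off described for CEC-IB can be illustrated with a toy cost for a fixed 1-D clustering. This is not the objective from the paper. The function `three_term_cost`, its weighting parameter `beta`, and the use of the conditional entropy H(label | cluster) as the consistency penalty are illustrative assumptions. The first two terms loosely follow cross-entropy clustering, which charges each point the code length of its cluster index plus the differential entropy of a Gaussian fitted to that cluster.

```python
import math
from collections import Counter, defaultdict

def gaussian_entropy_1d(values):
    """Differential entropy (nats) of a 1-D Gaussian fitted by maximum likelihood."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n  # must be > 0 for the log below
    return 0.5 * math.log(2 * math.pi * math.e * var)

def three_term_cost(data, assignment, labels, beta):
    """Hypothetical cost: Gaussian fit + cluster-index code length + beta * H(label | cluster).

    data: 1-D points; assignment: cluster id per point;
    labels: side-information label per point, or None for unlabeled points.
    """
    n = len(data)
    clusters = defaultdict(list)
    for x, k in zip(data, assignment):
        clusters[k].append(x)
    weights = {k: len(v) / n for k, v in clusters.items()}

    # Accuracy: weighted differential entropy of the per-cluster Gaussian fits.
    fit = sum(weights[k] * gaussian_entropy_1d(v) for k, v in clusters.items())
    # Simplicity: entropy of the cluster weights (code length of the cluster index).
    simplicity = -sum(p * math.log(p) for p in weights.values())

    # Consistency: conditional entropy of labels given clusters, labeled points only.
    pairs = [(k, y) for k, y in zip(assignment, labels) if y is not None]
    consistency = 0.0
    if pairs:
        joint = Counter(pairs)
        per_cluster = Counter(k for k, _ in pairs)
        m = len(pairs)
        for (k, _), count in joint.items():
            consistency -= (count / m) * math.log(count / per_cluster[k])
    return fit + simplicity + beta * consistency

# Usage: two well-separated groups with partial labels "a" and "b".
data = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]
labels = ["a", "a", None, "b", None, "b"]
good = [0, 0, 0, 1, 1, 1]
bad = [0, 1, 0, 1, 0, 1]
```

A clustering that respects the side information gets a zero consistency term, so increasing `beta` only penalizes assignments that mix labels within a cluster. Each cluster needs at least two distinct values. Otherwise the fitted variance is zero and the logarithm is undefined.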

Information-Preserving Markov Aggregation

Authors: Geiger Bernhard C.; Temmel Christoph
Publication date: 01/01/2013

We present a sufficient condition for a non-injective function of a Markov chain to be a second-order Markov chain with the same entropy rate as the original chain. This permits an information-preserving state space reduction by merging states or, equivalently, lossless compression of a Markov source on a sample-by-sample basis. The cardinality of the reduced state space is bounded from below by the node degrees of the transition graph associated with the original Markov chain. We also present an algorithm that lists all possible information-preserving state space reductions for a given transition graph. We illustrate our results by applying the algorithm to a bigram letter model of an English text.
Comment: 7 pages, 3 figures, 2 tables

arXiv.org e-Print Archive

VU Research Portal
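The Markov aggregation abstract does not spell out its sufficient condition. The following Python sketch therefore uses an assumed stand-in that is consistent with the stated degree bound: the aggregation function `g` must assign distinct labels to the successors of every state. Under that assumption, the original state sequence can be recovered sample by sample from the initial state and the label sequence. The number of labels also can be no smaller than the largest out-degree. The brute-force `enumerate_labelings` is a hypothetical helper, not the paper's algorithm. The sketch does not check the second-order Markov property or the entropy rate.

```python
from itertools import product

def successors(P):
    """Map each state i to the set of states j with P[i][j] > 0."""
    return {i: {j for j, p in enumerate(row) if p > 0} for i, row in enumerate(P)}

def is_successor_injective(P, g):
    """ASSUMED condition: g gives distinct labels to the successors of every state."""
    for succ in successors(P).values():
        labels = [g[j] for j in succ]
        if len(labels) != len(set(labels)):
            return False
    return True

def degree_lower_bound(P):
    """Largest out-degree: no successor-injective g can use fewer labels."""
    return max(len(s) for s in successors(P).values())

def decode_path(P, g, x0, ys):
    """Recover X_1..X_n from X_0 and labels Y_1..Y_n for a successor-injective g."""
    succ = successors(P)
    path, x = [], x0
    for y in ys:
        # Exactly one successor of x carries label y; the unpacking fails otherwise.
        (x,) = [j for j in succ[x] if g[j] == y]
        path.append(x)
    return path

def enumerate_labelings(P, k):
    """Hypothetical brute force over all k**len(P) maps g (exponential in len(P))."""
    return [g for g in product(range(k), repeat=len(P)) if is_successor_injective(P, g)]

# Four-state ring in which each state either stays put or moves to the next state.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.5, 0.0, 0.0, 0.5],
]
g = (0, 1, 0, 1)
print(degree_lower_bound(P))                   # 2
print(decode_path(P, g, 0, [0, 1, 0, 0, 1]))   # [0, 1, 2, 2, 3]
```

In this toy chain, four states merge into two labels. The only valid two-label maps are the two alternating labelings.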

Information Loss and Anti-Aliasing Filters in Multirate Systems

Authors: Geiger Bernhard C.; Kubin Gernot
Publication date: 01/01/2014

This work investigates the information loss in a decimation system, i.e., in a downsampler preceded by an anti-aliasing filter. It is shown that, without a specific signal model in mind, the anti-aliasing filter cannot reduce information loss, whereas for a simple signal-plus-noise model, for example, it can. For the Gaussian case, the optimal anti-aliasing filter is shown to coincide with the one obtained from energetic considerations. For a non-Gaussian signal corrupted by Gaussian noise, the Gaussian assumption yields an upper bound on the information loss, justifying filter design principles based on second-order statistics from an information-theoretic point of view.
Comment: 12 pages; a shorter version of this paper was published at the 2014 International Zurich Seminar on Communications

arXiv.org e-Print Archive

Repository for Publications and Research Data
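The decimation system in "Information Loss and Anti-Aliasing Filters in Multirate Systems" consists of an anti-aliasing filter followed by a downsampler, and it can be sketched in a few lines of Python. The signal-plus-noise model used for the information comparison is a hypothetical toy, not the paper's model. Each signal value `s_k` is held for two samples, and each sample carries independent Gaussian noise. Averaging the two samples before keeping one of them halves the noise variance, which doubles the SNR of the retained sample. The names `decimate`, `gaussian_mi` and `info_per_kept_sample` are illustrative.

```python
import math

def fir_filter(x, h):
    """Causal FIR filter with zero initial state: y[n] = sum_k h[k] * x[n - k]."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]

def downsample(x, m, phase=0):
    """m-fold downsampler keeping samples phase, phase + m, phase + 2m, ..."""
    return x[phase::m]

def decimate(x, h, m, phase=0):
    """Decimation system: anti-aliasing filter h followed by an m-fold downsampler."""
    return downsample(fir_filter(x, h), m, phase)

def gaussian_mi(snr):
    """I(S; S + N) in nats for independent zero-mean Gaussians with power ratio snr."""
    return 0.5 * math.log1p(snr)

# Hypothetical toy model: x[2k] = s_k + w[2k], x[2k + 1] = s_k + w[2k + 1],
# with i.i.d. Gaussian noise w and signal-to-noise power ratio snr per sample.
def info_per_kept_sample(snr, use_filter):
    """Information about s_k in the retained sample, with or without pair averaging."""
    # The filter h = [0.5, 0.5] halves the noise variance, i.e. doubles the SNR.
    return gaussian_mi(2 * snr if use_filter else snr)

x = [1.0, 3.0, 5.0, 7.0]
print(decimate(x, [0.5, 0.5], 2, phase=1))  # [2.0, 6.0]
```

In this toy model, the pair average is a sufficient statistic for `s_k`. Filtering with `h = [0.5, 0.5]` and keeping the phase-1 outputs therefore retains everything the two samples carried about it, whereas plain downsampling discards one of the two noisy observations. This matches the qualitative claim that a filter can help under a signal-plus-noise model. It says nothing about the model-free case.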